Gamified crowdsourcing for idiom corpora construction
Abstract
Learning idiomatic expressions is seen as one of the most challenging stages in second-language learning because of their unpredictable meaning. A similar situation holds for their identification within natural language processing applications such as machine translation and parsing. The lack of high-quality usage samples exacerbates this challenge, not only for humans but also for artificial intelligence systems. This article introduces a gamified crowdsourcing approach for collecting language learning materials for idiomatic expressions. A messaging bot is designed as an asynchronous multiplayer game for native speakers, who compete with each other while providing idiomatic and nonidiomatic usage examples and rating other players’ entries. Classical annotation efforts in the field rely on crowd-processing. In contrast, this work is the first in the literature to implement and test a crowd-creating & crowd-rating approach for idiom corpora construction. The approach is language-independent and is evaluated on two languages in comparison with traditional data preparation techniques in the field. The reaction of the crowd is monitored under different motivational means (namely, gamification affordances and monetary rewards). The results reveal that the proposed approach is powerful in collecting the targeted materials. They also show that the crowd found it entertaining and useful, even though it is an explicit crowdsourcing approach. The approach has been shown to have the potential to speed up the construction of idiom corpora for different natural languages. These corpora can serve as second-language learning material, as training data for supervised idiom identification systems, or as samples for lexicographic studies.
Similar resources
Interview: Acquiring Corpora using Crowdsourcing
Crowdsourcing has become one of the hottest topics in the artificial intelligence community in recent years. Its application to speech and language processing tasks like speech transcription has been very appealing, but what about creating corpora? Can we harness the power of crowdsourcing to improve training data sets for spoken language processing applications like dialogue systems? project to...
The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy
Crowdsourcing is an increasingly popular, collaborative approach for acquiring annotated corpora. Despite this, reuse of corpus conversion tools and user interfaces between projects is still problematic, since these are not generally made available. This demonstration will introduce the new, open-source GATE Crowdsourcing plugin, which offers infrastructural support for mapping documents to cro...
Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features
Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and the results of an idiom identification experiment using the corpus. The corpus targets 146 amb...
Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing
Recent work has established the efficacy of Amazon’s Mechanical Turk for constructing parallel corpora for machine translation research. We apply this to building a collection of parallel corpora between English and six languages from the Indian subcontinent: Bengali, Hindi, Malayalam, Tamil, Telugu, and Urdu. These languages are low-resource, under-studied, and exhibit linguistic phenomena tha...
Automatic Corpora Construction for Text Classification
As machines become more and more intelligent, it is reasonable to expect text classifiers to be constructed automatically given just the objective categories. As trade-off solutions, existing research usually provides additional information to the category terms to enhance the performance of a classifier. Unlike these approaches, in this paper we construct the standard corpora from the web b...
Journal
Journal title: Natural Language Engineering
Year: 2022
ISSN: 1469-8110, 1351-3249
DOI: https://doi.org/10.1017/s1351324921000401